Blind source separation 
in the presence of weak sources 

Abstract 

We investigate the information processing of a linear mixture of independent sources 
of different magnitudes. In particular we consider the case where a number m of the 
sources can be considered as "strong" as compared to the other ones, the "weak" 
sources. We find that it is preferable to perform blind source separation in the space 
spanned by the strong sources, and that this can be easily done by first projecting 
the signal onto the m largest principal components. We illustrate the analytical results 
with numerical simulations. 
1 Introduction 



In recent years many studies have been devoted to Blind Source 
Separation (BSS) and more generally to Independent Component Analysis (ICA) (see e.g. 
0, ||, ||, 0). Within the standard framework one assumes a multidimensional measured sig- 
nal to result from a linear mixture of statistically independent components, or "sources". 
In most cases one makes the optimistic hypotheses that the number of sources is equal to 
the dimension of the signal (the number of sensors), and that the unknown mixture matrix 
is invertible. The goal of BSS is then to compute an estimate of the inverse of the mixture 
matrix in order to extract the independent components from the signal. 

In the present paper we study the effect of having sources with different "strengths" 
when performing BSS. After giving a proper definition of the strength of a source, the main 
purpose of our study is to relate the strength of a source to its contribution to the information 
conveyed by the processing system about the signal, and to consider in more detail the 
case where some of the sources are very weak compared to the others. We will show that 
in that case it is worthwhile to project the data onto the space generated by the strong 
sources in order to extract meaningful information and to avoid numerical problems. The 
contributions to the (projected) signal from the weak sources can then be considered as noise 
terms added to the linear mixture of strong sources. Since the sources are independent, this 
"noise" is thus independent of the "pure" signal (the part due to the strong sources). 

The paper is organized as follows. In section 2 we introduce the model and give a precise 
definition of the strength of a source. In section 3 we compute Shannon information quantities 
from which we characterize how each source contributes to the information conveyed by the 
data and by the output of the processing network. We then discuss the case of a linear 
mixture of $N$ independent sources with $N - m$ "weak" sources and $m$ "strong" sources. The 
results of section 3 show that in such a case it would be preferable to work in 
the $m$-dimensional space spanned by the strong sources. We show in section 4 that, to a 
good approximation, this is simply done by projecting the data onto the $m$ largest principal 
components. As a result one can perform BSS in the $m$-dimensional space, where one is 
dealing with an $m$-dimensional linear mixture corrupted by a weak input noise. In section 5 
we study, at first nontrivial order in the noise strength, the expected performance in the 
estimation of the $m$ strong sources. Finally, in section 6 we present numerical simulations. 



2 The Model 

We consider the information processing of a signal which is an $N$-dimensional linear mixture 
of $N$ independent sources. At each time $t$ one observes $\mathbf{S}(t) = \{S_j(t), j = 1, \dots, N\}$, which 
can be written in terms of the unknown sources $\mathbf{s}(t) = \{s_\alpha(t), \alpha = 1, \dots, N\}$ as: 

$$S_j = \sum_{\alpha=1}^{N} M_{j\alpha}\, s_\alpha , \qquad j = 1, \dots, N, \qquad (1)$$

where $M = \{M_{j\alpha}, j = 1, \dots, N, \alpha = 1, \dots, N\}$ is the mixture matrix, assumed to be invertible. 
As is well known, and easily seen from the above equation, it is not possible to distinguish 
the mixture of $\mathbf{s}$ with the matrix $M$ from the mixture of $\mathbf{s}' = PD\mathbf{s}$ with the matrix 
$M' = M D^{-1} P^{-1}$, where $D$ is an arbitrary diagonal matrix with nonzero diagonal elements 
and $P$ an arbitrary permutation of the $N$ indices. If we decide to consider both normalized sources 
and normalized mixture matrices, we are left with a diagonal matrix $D$ which defines the 
"strengths" of the sources. More precisely we write 



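$$S_j = \sum_{\alpha=1}^{N} M_{j\alpha}\, \eta_\alpha\, s_\alpha , \qquad j = 1, \dots, N, \qquad (2)$$

assuming zero mean and unit variance for every source: 

$$\langle s_\alpha \rangle = 0, \qquad \langle s_\alpha^2 \rangle = 1, \qquad \alpha = 1, \dots, N, \qquad (3)$$

where $\langle \cdot \rangle$ denotes the average with respect to the (unknown) source probability 
distributions, 

$$P(\mathbf{s}) = \prod_\alpha p_\alpha(s_\alpha), \qquad (4)$$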

and with $M$ the normalized mixture matrix. The normalization can be chosen in different 
ways, and two of them are of particular interest for what follows. The simplest one is, for 
each $\alpha$, 

$$\left[ M^T M \right]_{\alpha\alpha} = \sum_{j=1}^{N} (M_{j\alpha})^2 = 1. \qquad (5)$$

The second one is a normalization on the inverse of the mixture matrix: 

$$\left[ M^{-1} (M^{-1})^T \right]_{\alpha\alpha} = \sum_{j=1}^{N} \left( [M^{-1}]_{\alpha j} \right)^2 = 1. \qquad (6)$$

Once a particular normalization, such as (5) or (6), is chosen, the parameters $\eta_\alpha$ in (2) are 
well defined and can be understood as the relative strengths of the sources. 
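
As an illustration of how a normalization fixes the strengths, the following is a minimal NumPy sketch, not part of the original analysis. It assumes an unnormalized mixing matrix `A` (a hypothetical name and hypothetical numerical values) and factors it as `A = M diag(eta)`, with `M` having either unit-norm columns (the first normalization above) or an inverse with unit-norm rows (the second one). The function names are illustrative only.

```python
import numpy as np

def strengths_column_norm(A):
    """Unit-norm columns of M: returns (M, eta) with A = M @ diag(eta)."""
    eta = np.linalg.norm(A, axis=0)          # one strength per source (column)
    return A / eta, eta                      # broadcasting divides column a by eta[a]

def strengths_inverse_norm(A):
    """Unit-norm rows of M^{-1}: returns (M, eta) with A = M @ diag(eta)."""
    A_inv = np.linalg.inv(A)
    eta = 1.0 / np.linalg.norm(A_inv, axis=1)  # 1/eta_a^2 = sum_j ([A^{-1}]_{aj})^2
    return A / eta, eta

# Hypothetical 2 x 2 mixture with one strong and one weak source.
A = np.array([[2.0, 0.01],
              [1.0, 0.02]])
M5, eta5 = strengths_column_norm(A)
M6, eta6 = strengths_inverse_norm(A)
print("strengths, column normalization :", eta5)
print("strengths, inverse normalization:", eta6)
```

The two normalizations generally give different numerical values for the strengths, but in this example both rank the second source as much weaker than the first.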



3 Information processing in the presence of inhomoge- 
neous sources 

Since the mixture matrix is assumed to be invertible, it is in principle possible to compute 
an estimate of it. This can be done with any one of the known blind source separation (BSS) 
algorithms (see e.g. || ||, [| |Tl|). As a result one obtains an estimate of the inverse of the 
mixture matrix, which in our notation can be written as 

$$\frac{1}{\eta_\alpha} \left[ M^{-1} \right]_{\alpha j} . \qquad (7)$$

This shows that it will be dominated by the smallest $\eta$'s, and numerical instabilities or 
overflows may occur if some of them are very small. In many approaches to BSS, whitening 
of the data is performed first. The whitened data are then an orthogonal mixture of sources, 
so that after this preprocessing one has sources of equal strengths. But this preprocessing 
requires a multiplication by the inverse of the eigenvalues, and this is subject to the same 
numerical problems as the computation of the inverse of the mixture matrix: as we will 
see in section 4, small values of $\eta$ lead to the existence of small eigenvalues. 
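
The numerical difficulty can be made concrete with a short NumPy sketch. The $4 \times 4$ matrix below is a hypothetical, well-conditioned example with unit-norm columns, and `eps` is an assumed strength for a single weak source; neither comes from the paper. The sketch reports the condition number of the full mixture and the smallest eigenvalue of the data covariance, which is the quantity a whitening step has to invert.

```python
import numpy as np

# Hypothetical, well-conditioned mixing directions (columns normalized to unit norm).
M_raw = np.array([[1.0, 0.5, 0.2, 0.3],
                  [0.2, 1.0, 0.4, 0.1],
                  [0.1, 0.3, 1.0, 0.5],
                  [0.3, 0.1, 0.2, 1.0]])
M = M_raw / np.linalg.norm(M_raw, axis=0)

def diagnostics(eps):
    """Condition number of the mixture and smallest covariance eigenvalue
    when the last source has strength eps and the others strength 1."""
    eta = np.array([1.0, 1.0, 1.0, eps])
    A = M * eta                       # A = M diag(eta)
    C = A @ A.T                       # covariance for unit-variance independent sources
    lam_min = np.linalg.eigvalsh(C)[0]
    return np.linalg.cond(A), lam_min

for eps in (1e-1, 1e-2, 1e-4):
    cond, lam_min = diagnostics(eps)
    print(f"eps={eps:g}  cond(A)={cond:.3g}  smallest eigenvalue={lam_min:.3g}")
```

In this example the condition number grows like $1/\eta$ and the smallest eigenvalue shrinks like $\eta^2$ as the weak strength decreases, which is the behavior described above.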






3.1 Information content of the data 



Let us now compute the amount of information conveyed by the data, S, about the sources, 
that is the mutual information @ I(S, s). To do so we consider 

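$$S_j = \sum_{\alpha=1}^{N} M_{j\alpha}\, \eta_\alpha\, s_\alpha + \nu_j , \qquad j = 1, \dots, N, \qquad (8)$$

where $\boldsymbol{\nu} = \{\nu_j, j = 1, \dots, N\}$ is a vanishing additive noise, $\langle \nu_j \rangle = 0$, $\langle \nu_j \nu_{j'} \rangle = b\, \delta_{jj'}$ with 
$b \to 0$. Then $I(\mathbf{S}, \mathbf{s})$ is a constant (that is, a quantity that depends on $b$ alone) plus the data 
entropy. Since the mixture matrix is invertible, we have 

$$I(\mathbf{S}, \mathbf{s}) = \text{Const.} + \ln |\det M| + \sum_\alpha \ln \eta_\alpha - \sum_\alpha \int dh_\alpha\, p_\alpha(h_\alpha) \ln p_\alpha(h_\alpha). \qquad (9)$$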

The last term in the above expression is the sum of the source entropies. One should remember 
that the $s_\alpha$'s are the normalized sources, $\langle s_\alpha^2 \rangle = 1$. This shows that each source 
contributes to the information by a combination of its strength and its entropy: the strength 
term favors strong sources, whereas the entropy term favors the sources with a probability 
distribution function (p.d.f.) close to Gaussian. The entropy terms, however, are bounded: 
the entropy of a source cannot exceed that of a Gaussian with the same variance, that is 

$$- \int dh_\alpha\, p_\alpha(h_\alpha) \ln p_\alpha(h_\alpha) \le \frac{1}{2} \ln 2\pi e . \qquad (10)$$

Hence the information can easily be dominated by the strength terms, which can be arbitrarily 
large. 
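
The Gaussian bound on the entropy can be checked on standard closed-form cases. The sketch below is only an illustration: the choice of the uniform and Laplace densities as comparison cases is ours, not the paper's. It evaluates the differential entropy (in nats) of three unit-variance distributions.

```python
import math

def unit_variance_entropies():
    """Differential entropies (nats) of three zero-mean, unit-variance densities."""
    return {
        # Gaussian: (1/2) ln(2 pi e), the upper bound.
        "gaussian": 0.5 * math.log(2.0 * math.pi * math.e),
        # Uniform on an interval of width w has variance w^2/12, so w = sqrt(12);
        # its entropy is ln(w).
        "uniform": math.log(math.sqrt(12.0)),
        # Laplace with scale b has variance 2 b^2, so b = 1/sqrt(2);
        # its entropy is 1 + ln(2 b).
        "laplace": 1.0 + math.log(math.sqrt(2.0)),
    }

for name, h in unit_variance_entropies().items():
    print(f"{name:9s} entropy = {h:.4f}")
```

Both non-Gaussian entropies fall below the Gaussian value, so for unit-variance sources the entropy term can shift the information by at most a bounded amount, while the $\ln \eta_\alpha$ term has no such bound.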

It is known that for performing BSS perfect knowledge of the sources distribution is not 
necessary, and working on the cumulants of order 2 and 3 or 4 is sufficient (see e.g. || [Tl"|j). 



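We can thus analyze the result Eq. (9) by making a close-to-Gaussian approximation || |TT 
If we assume the sources to have nonzero third order cumulants, 

$$\lambda_\alpha^{(3)} = \langle s_\alpha^3 \rangle_c , \qquad (11)$$

we replace the source distribution $p_\alpha$ by 

$$\tilde{p}_\alpha(s_\alpha) = \frac{e^{-s_\alpha^2/2}}{\sqrt{2\pi}} \left( 1 + \lambda_\alpha^{(3)}\, \frac{s_\alpha (s_\alpha^2 - 3)}{6} \right). \qquad (12)$$

The distribution $\tilde{p}_\alpha$ has the same first three moments as the true distribution $p_\alpha$ |TJ. 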

In the case of a symmetric non-Gaussian distribution, the third order cumulants are zero 
and one has then to take into account non zero fourth order cumulants. It is a straightforward 
exercise to perform the same analysis as below in that case. For simplicity in this paper we 
will consider only the case of non symmetric distributions. 
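
The moment-matching property of the close-to-Gaussian approximation can be verified numerically. The sketch below assumes the standard Gram-Charlier form, a unit Gaussian multiplied by $1 + \lambda^{(3)} s (s^2 - 3)/6$, and an arbitrary illustrative value of the third cumulant. The grid size and integration range are our choices.

```python
import numpy as np

def gram_charlier_pdf(s, lam3):
    """Unit Gaussian corrected by the third Hermite polynomial (assumed form)."""
    gauss = np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)
    return gauss * (1.0 + lam3 * (s**3 - 3.0 * s) / 6.0)

def moments(lam3, s_max=12.0, n=200001):
    """Moments of order 0..3, computed with a plain Riemann sum on a fine grid."""
    s = np.linspace(-s_max, s_max, n)
    ds = s[1] - s[0]
    p = gram_charlier_pdf(s, lam3)
    return [float(np.sum(p * s**k) * ds) for k in range(4)]

norm, mean, second, third = moments(lam3=0.3)
print(f"normalization={norm:.6f} mean={mean:.6f} "
      f"second moment={second:.6f} third moment={third:.6f}")
```

The approximate density integrates to one, has zero mean and unit variance, and its third moment equals the chosen cumulant. Note that this expression is not guaranteed to be positive far in the tails, so it should be read as a perturbative expansion rather than as an exact density.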

Within this approximation, Eq. (12), the mutual information (9) reads: 

$$I(\mathbf{S}, \mathbf{s}) = \text{Const.} + \ln |\det M| + \sum_\alpha \ln \eta_\alpha + \frac{N}{2} \ln 2\pi e - \frac{1}{12} \sum_\alpha \langle s_\alpha^3 \rangle_c^2 . \qquad (13)$$

From the above expression, the most important sources are those for which the quantity 

$$\frac{1}{12} \langle s_\alpha^3 \rangle_c^2 - \ln \eta_\alpha \qquad (14)$$

is the smallest. 

We consider now the information that will be conveyed by a network processing the data, 
and ask for the contribution to this information by each source when the network performs 
BSS. 



3.2 Characterization from infomax 



The infomax criterion [K| will allow us to get some more insight onto the link between 
the sources strengths and the amount of information that can be extracted from the data. 

We consider the information processing of the signal by a nonlinear network, and we are 
interested in computing the mutual information $I(\mathbf{V}, \mathbf{S})$ between the input $\mathbf{S}$ and the output 
$\mathbf{V} = \{V_i, i = 1, \dots, N\}$ of the network. Since the signal is a linear mixture, the relevant 
architecture is a linear processing followed by a (possibly) nonlinear transfer function which 
may differ from neuron to neuron: 

$$V_i = f_i(h_i) + \nu_i \qquad (15)$$
$$h_i = \sum_j J_{ij} \left( S_j + \nu^0_j \right) , \qquad (16)$$

where $\boldsymbol{\nu}^0 = \{\nu^0_j, j = 1, \dots, N\}$ and $\boldsymbol{\nu} = \{\nu_i, i = 1, \dots, N\}$ are additive input and output noise, 
respectively, with $\langle \boldsymbol{\nu}^0 \rangle = 0$, $\langle \boldsymbol{\nu} \rangle = 0$, $\langle \nu^0_j \nu^0_{j'} \rangle = b^0 \delta_{jj'}$, $\langle \nu_i \nu_{i'} \rangle = b\, \delta_{ii'}$. The $J_{ij}$ can 
be viewed as synaptic efficacies and the $h_i$'s as post-synaptic potentials (PSP). As explained 
in the previous section, the noise has to be introduced in order to have a nontrivial mutual 
information, and we take the limit $0 \le b^0 \ll b \ll 1$. For strictly zero input noise, $b^0 = 0$, 
in the limit $b \to 0$ the mutual information is, up to a constant, equal to the output entropy. 



As shown in [0] its maximization over the choice of both $J$ and the transfer functions $f_i$'s 
leads to BSS. One can then derive practical algorithms for performing BSS 0. In this limit 
of $b^0 = 0$ all the sources play the same role, that is, the maximum of the mutual information 
is independent of the individual source properties as well as of the mixture matrix. When 
one takes into account a nonzero input noise, then at first nontrivial order in $b^0/b$ one sees 
that the input noise introduces a scale which breaks this invariance. More precisely, at first 
order in $b^0/b$ the mutual information $I(\mathbf{V}, \mathbf{S})$ can be written (see [IIJ for details): 

$$I(\mathbf{V}, \mathbf{S}) = I_0(\mathbf{V}, \mathbf{S}) - \frac{b^0}{2b} \sum_{i=1}^{N} \Gamma_i \int dh_i\, \psi_i(h_i)\, f_i'^{\,2}(h_i) , \qquad (17)$$

where $I_0(\mathbf{V}, \mathbf{S})$ is the value at $b^0 = 0$, 

$$I_0(\mathbf{V}, \mathbf{S}) = \text{Const.} - \int d\mathbf{h}\, \Psi(\mathbf{h}) \ln \frac{\Psi(\mathbf{h})}{\prod_i f_i'(h_i)} , \qquad (18)$$

and $b^0 \Gamma_i$ is the variance of the noise on the PSP $h_i$: 

$$\Gamma_i = \left[ J J^T \right]_{ii} . \qquad (19)$$




Finally, $\Psi(\mathbf{h})$ is the probability distribution of $\mathbf{h}$ induced by the source input distribution, 
and $\psi_i(h_i)$ the marginal distribution of the PSP $h_i$. At a given $J$, optimizing with respect to 
the choice of transfer functions gives 

$$f_i'(h_i) = \psi_i(h_i) \left\{ 1 + \frac{b^0}{b}\, \Gamma_i \left[ \langle \psi_i^2 \rangle - \psi_i^2(h_i) \right] \right\} \qquad (20)$$

with $\langle \psi_i^2 \rangle = \int dh_i\, \psi_i(h_i)\, \psi_i^2(h_i) = \int dh_i\, \psi_i(h_i)^3$. We now optimize over $J$. At zeroth order 
the optimum is reached for $J = M^{-1}$ (up to an arbitrary permutation), so that we write 

$$W = JM = 1_N + \frac{b^0}{b} W^1 , \qquad (21)$$

where $1_N$ is the $N \times N$ identity matrix. Expanding the mutual information at first order in $b^0/b$, 
one finds that there is no contribution from $W^1$ to this order. Hence the mutual information 
at first order in $b^0/b$ is given by Eq. (17) at $J = M^{-1}$, with $f_i'$ given by (20) in which we set 
$\psi_i = p_i$. This gives 

$$I(\mathbf{V}, \mathbf{S}) = \text{Const.} - \frac{b^0}{2b} \sum_{\alpha=1}^{N} \Gamma_\alpha \int ds_\alpha\, [p_\alpha(s_\alpha)]^3 \qquad (22)$$

with 

$$\Gamma_\alpha = \left[ M^{-1} (M^{-1})^T \right]_{\alpha\alpha} . \qquad (23)$$

One sees that the term depending on $M$ is what appears in the normalization (6) of the 
mixture matrix. Hence if one chooses this particular normalization (6) in order to define the 
strengths $\eta_\alpha$ of the sources, one can rewrite 

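$$I(\mathbf{V}, \mathbf{S}) = \text{Const.} - \frac{b^0}{2b} \sum_{\alpha=1}^{N} \frac{1}{\eta_\alpha^2}\, \langle p_\alpha^2 \rangle \qquad (24)$$

with $\langle p_\alpha^2 \rangle = \int ds_\alpha\, [p_\alpha(s_\alpha)]^3$. The above expression shows how each source $\alpha$ contributes to 
the mutual information in terms of its strength $\eta_\alpha$ and its p.d.f. $p_\alpha$. 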
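
To give a feeling for the source-dependent factor $\int ds\, [p(s)]^3$ that multiplies the inverse squared strength, the sketch below evaluates it numerically for three unit-variance densities. The Gaussian, Laplace and uniform cases are illustrative choices of ours, and the integration grid is an assumption. The exact values for these three cases are $1/(2\pi\sqrt{3})$, $1/6$ and $1/12$ respectively.

```python
import numpy as np

def cube_integral(pdf, s_max=40.0, n=800001):
    """Riemann-sum estimate of the integral of pdf(s)**3 over the real line."""
    s = np.linspace(-s_max, s_max, n)
    ds = s[1] - s[0]
    return float(np.sum(pdf(s) ** 3) * ds)

def gaussian(s):
    return np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)

def laplace(s):
    # Scale b = 1/sqrt(2) gives unit variance.
    return np.exp(-np.sqrt(2.0) * np.abs(s)) / np.sqrt(2.0)

def uniform(s):
    # Support [-sqrt(3), sqrt(3)] gives unit variance.
    return np.where(np.abs(s) <= np.sqrt(3.0), 1.0 / np.sqrt(12.0), 0.0)

for name, pdf in (("gaussian", gaussian), ("laplace", laplace), ("uniform", uniform)):
    print(f"{name:9s} integral of p^3 = {cube_integral(pdf):.5f}")
```

For unit-variance sources this factor stays of order $0.1$, so when the strengths differ by orders of magnitude the $1/\eta_\alpha^2$ factor controls the size of each term in the sum.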
Within the close-to-Gaussian approximation (12) one gets 

la N 1 

J(V,S) = Const - <4>a _ (25) 

a=l n a 

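Hence the sources which contribute the most to the conveyed information are those for which 
the quantity 

$$\xi_\alpha \equiv \langle s_\alpha^3 \rangle_c^2\, \frac{1}{\eta_\alpha^2} \qquad (26)$$

is the smallest. One should remember that, here, $\eta_\alpha$ is given by 

$$\frac{1}{\eta_\alpha^2} = \sum_{j=1}^{N} \left( [M^{-1}]_{\alpha j} \right)^2 . \qquad (27)$$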
3.3 Discussion 



As already seen when computing the mutual information between the data and the sources, 
a source will contribute if it is strong and/or close to Gaussian. However the particular 
combination which appears here is different from the one we obtained in the previous section: 
here we have a multiplicative combination of strength and cumulant, whereas in (14) it was 
an additive combination. 

An important practical remark is that, if the third order cumulants are zero, the close-to- 
Gaussian approximation has to take into account the fourth order cumulants. Then, instead 
of (0) and (26) one gets similar expressions with the fourth order cumulants in place of the 
third order ones. 

The criterion (26) can be used in different ways, depending on the particular application 
considered. The quantity $\xi_\alpha$ is zero for Gaussian sources, whatever their strengths. This is 
not surprising since the Shannon information is maximal for Gaussian distributions. However 
in many cases the Gaussian part of the signal is considered as "noise" , and the non Gaussian 
part is the "meaningful" part, the "true" signal. Hence mutual information can be used as 
a cost function in order to extract this noise, in particular when it is strong, which can then 
be subtracted from the input signal. In cases where one has distributions of similar shapes, 
(HI) suggests using the strength as defined in (27) to order the sources and select the most 
relevant ones. 

To conclude the present section 3, we see that the intuitive idea that weak sources can 
be considered as noise terms and cannot be estimated can be quantified from various points 
of view. From the purely numerical point of view, the mixture matrix is close to singular; 
moreover, the information content of the data and the amount of information conveyed by a 
processing channel are seriously diminished by the presence of weak sources. From this analysis, it 
appears clearly that it would be preferable to project the data onto the space 
spanned by the strong sources, in order to work in a space of smaller dimension with sources 
of similar strengths. In the next section we show that this is simply done by making use of 
principal component analysis. 



4 Principal Component Analysis 

A standard approach in data processing consists in first performing the principal component 
analysis (PCA), and then projecting the data onto the eigenspace associated with the largest 
eigenvalues. In the present context of BSS, it is reasonable to expect the space spanned by 
the strong sources to be essentially the same as the one associated with the largest principal 
components. The purpose of this section is to confirm this expectation and to make it more 
precise. 

We consider the specific case where $m$ sources are strong, while $N - m$ sources are weak. 
More precisely, choosing for later convenience the normalization (|3p, we assume 

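$$\eta_\alpha \sim O(1 = \epsilon^0) \quad \text{for } \alpha = 1, \dots, m$$
$$\eta_\alpha \sim O(\epsilon) \quad \text{for } \alpha = m+1, \dots, N, \qquad (28)$$

where $\epsilon$ is a small parameter, $\epsilon \ll 1$. This is equivalent to stating that there is a gap in the 
spectrum of eigenvalues at $\lambda_m$, with $\lambda_{m+1} \ll \lambda_m$. 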
We assume that the reduced $N \times m$ mixture matrix $M^0$, $\{M^0_{j\alpha} = M_{j\alpha}, j = 1, \dots, N; \alpha = 
1, \dots, m\}$, is of rank $m$, so that the ($N \times N$) correlation matrix (the covariance of the input 
signal) $C^0$, which would be obtained at $\epsilon = 0$, has $m$ nonzero eigenvalues. It is a standard 
exercise in perturbation theory |J to study the behavior of the eigenvalues and eigenvectors 
of a symmetric matrix, here the covariance matrix $C$ of the inputs, at first nontrivial order 
in the small parameter $\epsilon$. The eigenvalues have a smooth behavior with $\epsilon$: the $m$ largest 
eigenvalues of $C$ are, at first nontrivial order, the $m$ nonzero eigenvalues of $C^0$ shifted by 
quantities of order $\epsilon^2$, and the $N - m$ smallest ones are of order $\epsilon^2$. However the eigenvectors 
are very sensitive to small variations of $\epsilon$; this is related to the fact that the mixture matrix 
$M$ is close to singular for small $\epsilon$. More precisely, one gets the following results. 

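One can write $C$ as 

$$C = C^0 + \epsilon^2 C^1 , \qquad (29)$$

where $C^0$ is the correlation of the inputs that would be obtained without the weak sources 
($\epsilon = 0$), and $\epsilon^2 C^1$ contains all the contributions from the weak sources. We denote by $\lambda^0_\alpha$ the 
eigenvalues of $C^0$, with $\{\lambda^0_\alpha, \alpha = 1, \dots, m\}$ nonzero and $\lambda^0_\alpha = 0$ for $\alpha = m+1, \dots, N$. The 
associated eigenvectors $\{\mathbf{v}^0_\alpha, \alpha = 1, \dots, N\}$ form an orthonormal basis. If all the eigenvalues 
of $C^0$ are different (hence in particular $N = m + 1$), then, at first order, the eigenvalues of 
$C$ are 

$$\lambda_\alpha = \lambda^0_\alpha + \epsilon^2 \lambda^1_\alpha$$
$$\lambda^1_\alpha = \mathbf{v}^{0T}_\alpha C^1 \mathbf{v}^0_\alpha \qquad (\alpha = 1, \dots, N), \qquad (30)$$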



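and the corresponding eigenvectors are 

$$\mathbf{v}_\alpha = \mathbf{v}^0_\alpha + \epsilon^2 \sum_{\beta \neq \alpha} \frac{\mathbf{v}^{0T}_\beta C^1 \mathbf{v}^0_\alpha}{\lambda^0_\alpha - \lambda^0_\beta}\, \mathbf{v}^0_\beta \qquad (\alpha = 1, \dots, N). \qquad (31)$$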
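
The first-order eigenvalue formula can be checked numerically in the nondegenerate case $N = m + 1$. The sketch below uses a hypothetical example with $N = 3$, $m = 2$: two strong mixing directions `M0` and one weak direction `w`, all chosen by hand for illustration. It compares the exact eigenvalues of $C^0 + \epsilon^2 C^1$ with the prediction $\lambda^0_\alpha + \epsilon^2\, \mathbf{v}^{0T}_\alpha C^1 \mathbf{v}^0_\alpha$.

```python
import numpy as np

# Hypothetical strong directions (unit-norm columns) and weak direction.
M0 = np.array([[1.0, 0.6],
               [0.0, 0.8],
               [0.0, 0.0]])
w = np.ones(3) / np.sqrt(3.0)

C0 = M0 @ M0.T            # covariance without the weak source (rank 2)
C1 = np.outer(w, w)       # contribution of the weak source, to be scaled by eps^2

def compare(eps):
    """Return (first-order prediction, exact eigenvalues), both in ascending order."""
    lam0, V0 = np.linalg.eigh(C0)
    # lam1[a] = v0_a^T C1 v0_a for each unperturbed eigenvector (column of V0).
    lam1 = np.einsum("ia,ij,ja->a", V0, C1, V0)
    predicted = lam0 + eps**2 * lam1
    exact = np.linalg.eigvalsh(C0 + eps**2 * C1)
    return predicted, exact

predicted, exact = compare(1e-2)
print("predicted:", predicted)
print("exact    :", exact)
print("max abs difference:", np.max(np.abs(predicted - exact)))
```

In this example the unperturbed eigenvalues are well separated, so the residual difference is of order $\epsilon^4$, and the smallest eigenvalue, which vanishes at $\epsilon = 0$, becomes of order $\epsilon^2$.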



If there are degenerate eigenvalues (in particular the null eigenvalue is degenerate for $N > 
m + 1$), this is modified as follows. Suppose $C^0$ has only $r < N$ different eigenvalues, $\mu_1 > 
\mu_2 > \dots > \mu_r$, with degeneracies $q_a$, $a = 1, \dots, r$ ($\sum_a q_a = N$, $\mu_r = 0$ if $N > m + 1$). We have 

$$\lambda^0_\alpha = \mu_a \quad \text{for} \quad \sum_{b=1}^{a-1} q_b < \alpha \le \sum_{b=1}^{a} q_b \equiv \alpha_a \qquad (32)$$

and we set $\alpha_0 = 0$. Consider an eigenvalue $\mu_a$ with degeneracy $q_a > 1$. The eigenvectors of 
$C^0$ associated with $\mu_a$, $\{\mathbf{v}^0_\alpha, \alpha_{a-1} < \alpha \le \alpha_a\}$, form an orthonormal basis of this eigenspace of 
dimension $q_a$, and this basis is defined up to an arbitrary orthogonal transformation. This 
arbitrariness is removed at first nontrivial order in $\epsilon$, together with the removal of the 
eigenvalue degeneracy: the new $q_a$ eigenvalues for $\{\alpha_{a-1} < \alpha \le \alpha_a\}$ are given by Eq. (30), 
where the $\mathbf{v}^0_\alpha$'s form the particular $q_a \times q_a$ orthogonal matrix which diagonalizes $C^1_a$, the 
restriction of the matrix $C^1$ to the eigenspace of $\mu_a$, the $\lambda^1_\alpha$ being then the eigenvalues of $C^1_a$. 

The eigenvectors $\mathbf{v}_\alpha$ are now given by an equation similar to (31), with the sum over $\beta \neq \alpha$ 
replaced by a sum over the $\beta$ such that $\lambda^0_\beta \neq \lambda^0_\alpha$, and a new term specific to each degenerate 
eigenvalue $\mu_a$: 

$$\mathbf{v}_\alpha = \mathbf{v}^0_\alpha + \epsilon^2 \sum_{\beta :\, \lambda^0_\beta \neq \lambda^0_\alpha} \frac{\mathbf{v}^{0T}_\beta C^1 \mathbf{v}^0_\alpha}{\lambda^0_\alpha - \lambda^0_\beta}\, \mathbf{v}^0_\beta + \epsilon^2 \sum_{\beta :\, \lambda^0_\beta = \lambda^0_\alpha} X_{\alpha,\beta}\, \mathbf{v}^0_\beta \qquad (\alpha = 1, \dots, N), \qquad (33)$$

where the $\mathbf{v}^0_\alpha$ are chosen as just explained, and $X_{\alpha,\beta}$ is an arbitrary antisymmetric matrix. 

The final result is thus that the space generated by the $m$ eigenvectors associated with the 
$m$ largest eigenvalues is, to order $\epsilon^2$, the same space as the one which would be obtained in 
the absence of the weak sources. Projecting the data onto this space is then equivalent to 
working with the $m$-dimensional signal which is the mixture of the $m$ strong sources, weakly 
corrupted by an additive noise. 
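
This subspace result can be illustrated with a small NumPy sketch. All numerical choices below are assumptions made for illustration: $N = 4$, $m = 2$, a hand-picked mixing matrix with unit-norm columns, and unit-variance independent sources, so that the covariance is known exactly. The sketch measures the largest principal angle between the span of the $m$ leading eigenvectors and the span of the strong mixing directions.

```python
import numpy as np

# Hypothetical mixing directions; the first m columns belong to the strong sources.
M_raw = np.array([[1.0, 0.5, 0.2, 0.3],
                  [0.2, 1.0, 0.4, 0.1],
                  [0.1, 0.3, 1.0, 0.5],
                  [0.3, 0.1, 0.2, 1.0]])
M = M_raw / np.linalg.norm(M_raw, axis=0)

def strong_subspace(eps, m=2):
    """Eigenvalues of the covariance (ascending) and the largest principal angle
    between the top-m principal subspace and the span of the strong columns."""
    N = M.shape[0]
    eta = np.concatenate([np.ones(m), eps * np.ones(N - m)])
    A = M * eta                                   # A = M diag(eta)
    C = A @ A.T                                   # exact covariance of the data
    lam, V = np.linalg.eigh(C)
    top = V[:, -m:]                               # m leading principal directions
    Q, _ = np.linalg.qr(M[:, :m])                 # orthonormal basis of the strong span
    cosines = np.linalg.svd(Q.T @ top, compute_uv=False)
    angle = float(np.arccos(np.clip(cosines.min(), -1.0, 1.0)))
    return lam, angle

for eps in (1e-1, 1e-2, 1e-3):
    lam, angle = strong_subspace(eps)
    print(f"eps={eps:g}  eigenvalues={np.round(lam, 8)}  largest angle={angle:.3e}")
```

In this example the $N - m$ smallest eigenvalues and the angle between the two subspaces both decrease like $\epsilon^2$, consistent with the statement above. With sampled data instead of the exact covariance, an additional estimation error of statistical origin would be superimposed on this behavior.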



5 BSS with noisy data 

Let us now assume that we have pre-processed the data by projecting it onto the m largest 
principal components. To avoid the introduction of a new notation, in the following {Sj,j = 
1, ...,m} will denote these preprocessed data (projections) instead of the data themselves. 
Instead of the model Eq.(^) we have thus to consider the model 

$$S_j = \sum_{\alpha=1}^{m} M_{j\alpha}\, s_\alpha + \nu^0_j , \qquad j = 1, \dots, m. \qquad (34)$$

The matrix $M$ is now an $m \times m$ invertible mixture matrix, such that $M M^T$ has $m$ nonzero 
eigenvalues of order $1 = \epsilon^0$. The $s_\alpha$'s ($\alpha = 1, \dots, m$) are the sources of interest, and 
the $\nu^0_j$'s are additive noises, resulting from the weak sources, as explained in the previous 
section. This noise $\boldsymbol{\nu}^0 = \{\nu^0_j, j = 1, \dots, m\}$ is uncorrelated with the $m$ (strong) sources, and 
of arbitrary distribution $P(\boldsymbol{\nu}^0)$. Since we are working in the small $\epsilon$ regime, all we will need 
is to characterize this distribution by its first two cumulants: 

$$\langle \boldsymbol{\nu}^0 \rangle = 0$$
$$\langle \boldsymbol{\nu}^0 \boldsymbol{\nu}^{0T} \rangle = \epsilon^2 B , \qquad (35)$$

where B is a (possibly non diagonal) m x m symmetric matrix. The problem we are consid- 
ering now is thus strictly the same as the one of performing BSS on a linear mixture of m 
sources corrupted by some additive input noise, which, although small, cannot be neglected. 



5.1 The Mutual Information 

In this section we consider this noisy BSS problem within the infomax approach as formulated 
in [IC]. The network we consider has the same architecture as the one defined in Eq. fllED, 
but with m inputs and outputs: 

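$$V_i = f_i(h_i) + \nu_i \qquad (36)$$
$$h_i = \sum_{j=1}^{m} J_{ij} \left( S_j + \nu^0_j \right) , \qquad i = 1, \dots, m, \qquad (37)$$

with $\langle \nu_i \nu_{i'} \rangle = b\, \delta_{i,i'}$. The limit to be considered here is the one of a vanishing output 
noise, $b \to 0$, but at a given input noise level: 

$$0 \le b \ll \epsilon^2 . \qquad (38)$$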
Another important difference with the calculation done in section 3.2 is that here we are 
interested in computing the information conveyed about the global input, $\mathbf{S} + \boldsymbol{\nu}^0$, and not 
about the "pure" signal $\mathbf{S}$ alone. Indeed, in section 3.2 we considered some input noise 
corresponding to some noise at the level of the receptors, whereas here the actual signal is 
the global input, $\mathbf{S} + \boldsymbol{\nu}^0$, in which we have decided to call "(pure) signal" the part coming 
from the strong sources and "noise" the part due to the weak sources. 

In this limit of vanishing output noise, the mutual information $I(\mathbf{V}, \mathbf{S} + \boldsymbol{\nu}^0)$ between the 
output and the input of the network is, up to a constant, equal to the output entropy. To 
simplify the analysis, we assume a full adaptation of the transfer functions, which means [ [Tcfl , 
for $J$ given, 

$$f_i'(h_i) = \psi_i(h_i) , \qquad i = 1, \dots, m , \qquad (39)$$

where $\psi_i(h_i)$ is the marginal probability distribution of the PSP $h_i$. As a result the mutual 
information is, up to a constant, equal to minus the redundancy between the PSP's [ITU 

$$I(\mathbf{V}, \mathbf{S}) = \text{Const.} - \int d^m h\, \Psi(\mathbf{h}) \ln \frac{\Psi(\mathbf{h})}{\prod_{i=1}^{m} \psi_i(h_i)} . \qquad (40)$$

5.2 Maximization in the small e limit 

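In terms of the source distributions, the distribution $\Psi(\mathbf{h})$ is given by: 

$$\Psi(\mathbf{h}) = \int \prod_{\alpha=1}^{m} ds_\alpha\, p_\alpha(s_\alpha) \int d^m \nu^0\, P(\boldsymbol{\nu}^0) \prod_{i=1}^{m} \delta\Big( h_i - \sum_\alpha [JM]_{i\alpha}\, s_\alpha - \sum_j J_{ij}\, \nu^0_j \Big) . \qquad (41)$$

Since in Eq. (41) the noises are $\sim O(\epsilon)$ we can perform an expansion, leading to the 
following expression: 

$$\Psi(\mathbf{h}) = \Big\{ 1 + \frac{\epsilon^2}{2} \sum_{i,j} \left[ J B J^T \right]_{ij} \partial_i \partial_j \Big\}\, \Psi^0(\mathbf{h}) , \qquad (42)$$

where $\partial_i$ means the partial derivative with respect to $h_i$, and $\Psi^0(\mathbf{h})$ is the p.d.f. that would 
be obtained at $\epsilon = 0$. Because the noise has zero mean there is no term of order $\epsilon$ in (42). 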

We consider now the maximization of the mutual information over the choice of $J$, taking 
into account that $\epsilon$ is small. If $\epsilon$ were strictly zero, we would be back to the noiseless BSS 
problem for which the optimum is reached for $J = M^{-1}$ (up to an arbitrary permutation). 
So for nonzero $\epsilon$ we write 

$$W = JM = 1_m + \epsilon\, W^1 + O(\epsilon^2) , \qquad (43)$$

where $1_m$ is the $m \times m$ identity matrix, and the correction is a matrix of order at least $\epsilon$. 
Since $W$ now depends on $\epsilon$ we can also expand $\Psi^0$ in powers of $\epsilon$, and finally $\Psi(\mathbf{h})$ can then 
be written as 

$$\Psi(\mathbf{h}) = \prod_\alpha p_\alpha(h_\alpha) \left[ 1 + \epsilon\, Q[\mathbf{h}] + R[\mathbf{h}] \right] \qquad (44)$$

with 

$$Q[\mathbf{h}] = - \sum_{\alpha,\beta} [\ln p_\alpha]'(h_\alpha)\, W^1_{\alpha\beta}\, h_\beta - \mathrm{Tr}\, W^1 \qquad (45)$$
and $R[\mathbf{h}]$ contains terms of order at least $\epsilon^2$, coming from both $W$, equation (43), and $B$, 
equation (42). Similarly, for the marginal distributions: 

$$\psi_\alpha(h_\alpha) = p_\alpha(h_\alpha) \left\{ 1 + \epsilon\, Q_\alpha[h_\alpha] + R_\alpha[h_\alpha] \right\} , \qquad (46)$$

with 

$$Q_\alpha[h_\alpha] = - [\ln p_\alpha]'(h_\alpha)\, W^1_{\alpha\alpha}\, h_\alpha - W^1_{\alpha\alpha} . \qquad (47)$$

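The substitution of Eq. (44) and (46) in the expression (40) then gives for the mutual 
information, at first nontrivial order: 

$$I(\mathbf{V}, \mathbf{S}) = I_0(\mathbf{V}, \mathbf{S}) - \frac{\epsilon^2}{2} \int \prod_{\alpha=1}^{m} dh_\alpha\, p_\alpha(h_\alpha) \Big[ Q[\mathbf{h}] - \sum_\alpha Q_\alpha[h_\alpha] \Big]^2 . \qquad (48)$$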



The term $I_0(\mathbf{V}, \mathbf{S})$ corresponds to the part of the mutual information which does not take 
into account the weak sources. It is the same as if one computes the mutual information 
between the output $\mathbf{V}$ and the signal $M\mathbf{s}$, $I(\mathbf{V}, M\mathbf{s})$. The fact that there is no term of order 
$\epsilon$ in (48) can be understood as coming from the normalization conditions $\int d^m h\, \Psi(\mathbf{h}) = 1$ and 
$\int dh_\alpha\, \psi_\alpha = 1$, which imply 

$$\int \prod_{\alpha=1}^{m} dh_\alpha\, p_\alpha(h_\alpha)\, Q[\mathbf{h}] = 0$$

and 

$$\int dh_\alpha\, p_\alpha(h_\alpha)\, Q_\alpha[h_\alpha] = 0$$

(these properties can be easily checked by performing the integrations using the explicit 
expressions (45) and (47)). One has similar properties for the quantities of order $\epsilon^2$, $R[\mathbf{h}]$ 
and $R_\alpha[h_\alpha]$ defined in (44) and (46), so that they do not contribute at this order $\epsilon^2$ in the 
final result (48). 



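Now one has 

$$Q[\mathbf{h}] - \sum_\alpha Q_\alpha[h_\alpha] = - \sum_{\alpha \neq \beta} [\ln p_\alpha]'(h_\alpha)\, W^1_{\alpha\beta}\, h_\beta . \qquad (49)$$

The mutual information is maximized when the quadratic term in (48) is minimized, that is 
for $W^1_{\alpha\beta} = 0$ for $\alpha \neq \beta$. It follows that there is no correction to the mutual information at 
order $\epsilon^2$ and that corrections due to the weak sources appear at order $\epsilon^4$. 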



6 Numerical simulations 

In this section we illustrate our analysis by numerical simulations. We test the above analysis 
on the following toy example. We consider the ICA of natural images performed in || . First 
we reproduce the results in 0] (not shown here). We then create a new data base with 
artificially increased component strengths: new images are computed as a linear mixture 
of the previous ICA basis functions, but with the strength of 20 components increased 100 
times compared to the other 124. We performed ICA on this new data base, with the same 
algorithm based on infomax [II], [|, but after projecting the data onto the 20 largest principal 
components. The resulting basis functions, shown in Figure 1, demonstrate the efficiency of 
PCA preprocessing: we recover the correct 20 strongest components, and the computational time 
is considerably decreased. 

For such a signal, the PCA analysis is identical to a Fourier analysis, and therefore 
dropping the smallest eigenvalues means neglecting high frequencies. One thus expects to 
extract components which are smoothed versions of the components extracted when working 
with the full space. This is indeed what is shown in Figure 1. 

7 Concluding remarks 

We have discussed the task of Blind Source Separation in the case of a mixture of sources of 
unequal strengths. 

We have presented different, but related, ways of defining the relative strengths of the 
sources. In particular, when non zero input noise is taken into account the contribution of 
a source to the conveyed information can be characterized by a criterion which combines 
the mixture matrix elements and the third cumulant of the source distribution. This allows one 
to define the strength of a source once a proper normalization of the mixture matrix is 
assumed. Conversely, this study shows which sources will be "preferred" by the infomax 
criterion (which part of the signal is more likely to be well extracted by an ICA performed 
with infomax). 

The analysis indicates also that, although arbitrary, the assumed normalization of the 
mixture matrix may have an important practical role in the analysis of the outcome of an 
ICA, whenever one wants to extract the "meaningful" sources. Which part of the signal is 
more important is of course an application dependent notion. Prior knowledge related to a 
given case should allow one to define the proper normalization from which the appropriate scale 
of source strengths can be defined. Conversely each chosen normalization implies a particular 
physical interpretation which should be kept in mind when analyzing the outcome of an ICA. 

We have considered in more detail the particular case of the information processing 
of a linear mixture of independent sources when some of them are very weak as compared 
to the other sources. One should note that in such case the notion of strong versus weak 
is independent of the mixture matrix normalization. It is easily seen that the presence of 
weak sources leads to an almost singular mixture matrix, and this manifests itself by the 
existence of very small eigenvalues in the PCA analysis. We have shown that it is relevant to 
project the input data onto the largest principal components in order to extract the strongest 
independent sources. We have thus quantified the intuitive idea that the subspace, where 
most of the data live, is mainly spanned by the strongest independent sources. We illustrated 
this result on the ICA of the image data base studied in ||. 

A possible situation where the PCA will not be (sufficiently) helpful is when the strong 
sources generate a linear space of dimension smaller than the number of sources. This space 
will be found by the PCA. After projection onto the largest PC's, one has then to deal with 
an ICA with a number of sources larger than the number of sensors. This is an interesting 
problem which has received considerable attention recently, and several algorithms have been 
proposed. Our analysis then suggests that it can be meaningful to project onto the largest 
PC's (in order to eliminate the weak sources) and yet to search for a number of (strong) IC's 
larger than the number of largest PC's. 
Figure Captions 



Figure 1. Basis functions of the ICA solution 